Applicability domain for classification problems

Authors

  • Iurii Sushko
  • Sergii Novotarskyi
  • Anil Kumar Pandey
  • Robert Körner
  • Igor V. Tetko
Abstract

Classification models are common in QSAR modeling, and a good estimate of their accuracy is of crucial importance. The applicability domain provides additional information that identifies which compounds are classified with the best accuracy and which are expected to have poor, unreliable predictions. Selecting only the most reliable predictions can dramatically improve the performance of a method, at the cost of reduced prediction coverage [1]. In binary classification problems, the labels used by machine learning methods are discrete, {-1, 1}. Nonetheless, a model usually yields a continuous prediction. The most obvious metric for accuracy estimation is the distance between the predicted value and the edge of a class: the greater this distance, the more reliable and accurate the prediction for the given compound. This metric has already been used in several previous studies (e.g., [2]) and has demonstrated good separation of reliable and non-reliable classifications. For quantitative predictions, the standard deviation of ensemble predictions was found to be the most accurate distance measure in a recent benchmarking study [3]. We propose to integrate both metrics. Rather than giving a point estimate, this approach provides a probability distribution of finding a particular compound in one of the classes. The suggested metric is probability...
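The abstract breaks off before the proposed metric is defined, so the following is only a minimal sketch of how the two ingredients named above (distance to the class edge and ensemble standard deviation) could be combined into a class probability. It assumes, purely for illustration, that the ensemble predictions for a compound are normally distributed and that the class edge lies at 0 for labels {-1, 1}. The function names `class_probability` and `reliability` are hypothetical and do not come from the paper.

```python
import math
from statistics import mean, stdev


def class_probability(ensemble_predictions, threshold=0.0):
    """Estimate P(class = +1) for one compound.

    ASSUMPTION (not from the paper): the continuous predictions of the
    ensemble members follow a normal distribution, and the class edge
    is at `threshold`. Needs at least two ensemble predictions.
    """
    mu = mean(ensemble_predictions)       # ensemble mean prediction
    sigma = stdev(ensemble_predictions)   # ensemble standard deviation
    if sigma == 0.0:
        # All members agree exactly: the estimate degenerates to a step.
        if mu == threshold:
            return 0.5
        return 1.0 if mu > threshold else 0.0
    # Distance to the class edge, measured in standard deviations.
    z = (mu - threshold) / sigma
    # Normal CDF evaluated at z gives P(prediction > threshold).
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def reliability(ensemble_predictions, threshold=0.0):
    """Probability of the more likely class, between 0.5 and 1.0."""
    p = class_probability(ensemble_predictions, threshold)
    return max(p, 1.0 - p)
```

Under these assumptions, a compound whose ensemble predictions lie far from the edge and agree closely receives a probability near 0 or 1, while a compound near the edge, or one with widely scattered ensemble predictions, receives a probability near 0.5. Ranking compounds by `reliability` would then trade coverage for accuracy in the way the abstract describes.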


Similar resources

Sample-oriented Domain Adaptation for Image Classification

Image processing is a method of performing operations on an image in order to obtain an enhanced image or to extract useful information from it. Conventional image processing algorithms do not perform well in scenarios where the training images (source domain) used to learn the model have a different distribution from the test images (target domain). Also, many real world applicat...


Efficiency of different measures for defining the applicability domain of classification models

The goal of defining an applicability domain for a predictive classification model is to identify the region in chemical space where the model's predictions are reliable. The boundary of the applicability domain is defined with the help of a measure that shall reflect the reliability of an individual prediction. Here, the available measures are differentiated into those that flag unusual object...


Deep Unsupervised Domain Adaptation for Image Classification via Low Rank Representation Learning

Domain adaptation is a powerful technique when a large amount of labeled data with similar attributes is available in different domains. In real-world applications, there is a huge amount of data, but most of it is unlabeled. Domain adaptation is effective in image classification, where obtaining adequate labeled data is expensive and time-consuming. We propose a novel method named DALRRL, which consists of deep ...


Modified Fixed Grid Finite Element Method in the Analysis of 2D Linear Elastic Problems

In this paper, a modification of the fixed grid finite element method is presented and used to solve 2D linear elastic problems. This method uses non-boundary-fitted meshes for the numerical solution of partial differential equations. Special techniques are required to apply boundary conditions at the intersection of domain boundaries and non-boundary-fitted elements. Hence, a new met...


Solving infinite horizon optimal control problems of nonlinear interconnected large-scale dynamic systems via a Haar wavelet collocation scheme

We consider an approximation scheme using Haar wavelets for solving a class of infinite horizon optimal control problems (OCPs) of nonlinear interconnected large-scale dynamic systems. A computational method based on Haar wavelets in the time domain is proposed for solving the optimal control problem. The Haar wavelet integral operational matrix and the direct collocation method are utilized to find ...



Journal title:

Volume 2, Issue

Pages -

Publication date: 2010